Abstract

This paper presents a scheme using the virtual machine concept for
creating:

1) An environment for increasing the effectiveness of researchers who
must use analytical and modeling systems and have complex data management
needs.

2) A mechanism for multi-user coordination of access and update to a
central data base.

3) A mechanism for creating an environment where several different
modeling facilities can access the same data base.

4) A mechanism for creating an environment where several different and
potentially incompatible data management systems can all be accessed by
the same user models or facilities.

5) A mechanism for reducing the transport cost of integrating existing
analytical packages and application programs under a single unified
system.

6) A mechanism for enhancing the data management capabilities of
existing modeling and analytical languages.

Also presented is a theoretical analysis of the performance
implications of this scheme, specifically directed at the question of response
time degradation as a function of the number of virtual machines, of the locked
time of the data base machine, and of the query rate of the modeling machine.
A discussion of the practical implications of this analysis is given.


Application  Areas  Addressed 

There exist a number of applications that demand a computational
facility having interactive data management capabilities as well as
flexible analytical capabilities. The analytical capabilities demanded
include facilities for statistical analysis, a broad spectrum of modeling,
econometric modeling, dynamic simulation, cross-sectional analysis,
optimization, input/output modeling, graphic presentation, and
interpretation of results. These applications include information systems for
assisting public policymakers, e.g., analyzing energy problems, as well
as the general area of resource management. Other examples include
information systems in medical research as well as prototyping of corporate
decision support systems. Most present-day computer systems are concerned with
operational problems, for example, computing payrolls, taxes, and bills.
The class of applications above instead demands a computational facility to
support policy decision making. What is different about the problems associated
with this class of applications is that:

-  problems  are  not  known  long  in  advance; 

-  problems  keep  changing;  and 

-  solutions  are  needed  in  a  short  time-frame. 

Hence, the research reported here has been to develop a scheme for
rapid definition and development of a specific data management and analysis
structure designed to meet needs as perceived at a particular point in time,
but which are known to be evolving. Thus our emphasis is on techniques
for accommodating different data base and analysis systems in one
integrated framework. Rather than force the conversion and transport of
application systems to one operating system, we advocate the use of the
virtual machine concept, by which a virtual machine monitor controls the
timesharing of the resources of a single physical machine among different
operating systems. Network algorithms can be implemented to permit
communication of data between the "seemingly incompatible" operating systems.
Thus with such a system configuration it is possible, for example, to
access data from a data management system available under one operating
system for use in an analysis program available under another "seemingly
incompatible" operating system.

Hence, users of such systems can build upon existing programs and
are not required to be retrained in languages or tools they do not
presently know.

To fulfill the demands of these applications, we propose simulating
many computers on one computer and having all these simulated computers
interconnected to one common virtual machine executing an interactive
data management system. Each of the other simulated computers may run a
different modeling, simulation, or analytical package, even though these
packages may run under incompatible operating systems. Hence, users are
not required to learn a new analytical capability. Each computer may run
any existing model or program with no transfer costs.

Such a configuration eliminates the need to devote resources to
transporting application languages and programs between operating
systems and permits interaction between application languages and programs
not originally envisioned by their developers. For example, an analytical
package has its data management capabilities greatly enhanced.
Overview of System Architecture

Using this concept, M.I.T.'s Center for Information Systems Research
and the M.I.T. Energy Laboratory have developed a system of interconnected
simulated computers. The system is called GMIS (Generalized Management
Information System) ["GMIS: An Experimental System for Data Management and
Analysis", John J. Donovan and Henry D. Jacoby, M.I.T. Energy Laboratory
Working Paper No. MIT-EL-75-011WP] and has been applied to a number of
energy decision making systems, e.g., the New England Energy Management
Information System (NEEMIS) ["NEEMIS: Text of Governors' Presentation",
M.I.T. Sloan School Center for Information Systems Research Report CISR-18].
Much of the software has been developed with the assistance of IBM under an
M.I.T./IBM Joint Study Agreement.

It is not the purpose of this paper to describe GMIS. Rather, GMIS is
used in this section as an operational illustrative example of these
concepts.


Currently GMIS is implemented on an IBM System/370 computer. It uses
the Virtual Machine (VM) concept extensively. A virtual machine may be
defined as a replica of a real computer system simulated by a combination
of a Virtual Machine Monitor (VMM) software program and appropriate
hardware support. For example, the VM/370 system enables a single IBM System/370
to appear functionally as though it were multiple independent System/370's
(i.e., multiple "virtual machines"). Thus, a VMM can make one computer
system function as though it were multiple, physically isolated systems.

A configuration of virtual machines used in GMIS is depicted in
Figure 1, where each box denotes a separate virtual machine. Those
virtual machines across the top of the figure are executing programs that
provide user interfaces, whether they be analytical facilities, existing
models, or data base systems. All these programs can access data managed
by the general data management facility running on the virtual machine
depicted in the center of the page. A sample use of this architecture
might proceed as follows. A user activates a model, say in the APL/EPLAN
machine. That model requests data from the general data base machine
(called the Transaction Virtual Machine, or TVM), which responds by passing
back the requested data. Note that all the analytical facilities and data
base facilities may be incompatible with each other, in that they may run
under different operating systems.


The VM concept is presented in several places [Parmelee, 1972; Madnick and
Donovan, 1974; and Goldberg, 1973], and many of its advantages are
articulated elsewhere [Madnick, 1969; Buzen et al., 1973]. The concept of
"virtual machines" has been developed by IBM to the point of a production
system release, VM/370 [IBM, 1972].
Figure 1: Overview of the Software Architecture of GMIS


The communications facility between virtual machines is incorporated
in the program's Multi User Interface. The implementation of this
communications facility is described more fully in [Gutentag 75, Donovan and
Jacoby 75]. Essentially what is needed is a means of passing commands and
data to the data base machine, a means of returning data, and a locking and
queueing mechanism. One way to pass data is to use virtual card readers and
card punches. The data base virtual machine would be in a wait state trying to
read a card from its virtual card reader, and the analytical machine would
punch the commands to be read by the data base VM through that virtual card
reader. This mechanism is inefficient, however, and does not allow
flexible processing algorithms.

The mechanism implemented in GMIS is as follows (note that this
mechanism may be invisible to a modeler). Each user virtual machine (UVM),
which is accessed by logging on to a separate account ID under VM/370, sends
transactions to the Transaction Virtual Machine through a communications
facility that uses shared files and virtual card punches and readers. The Multi
User Interface (MUI) stacks these transaction requests and processes them
one at a time. The results of each transaction are passed back to the
virtual machine that made the request through the same communications
facility. Replies to the transactions may be processed with any software
interface that is required for the application. Extensions to this
architecture to allow interfaces to other data base systems and other computer
systems are discussed in [Donovan and Jacoby, 1975].
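
The request handling described above can be pictured as a first-in,
first-out queue served by a single worker. The following Python fragment is
only a minimal sketch of that idea and is not the GMIS implementation: the
class name TransactionMachine, the methods submit and process_all, the
machine identifiers, and the callable standing in for the data base system
are all hypothetical names introduced for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class TransactionMachine:
    """Toy stand-in for the transaction virtual machine (hypothetical names)."""

    def __init__(self, execute: Callable[[str], str]) -> None:
        # `execute` is an assumed stand-in for the data base system.
        self._execute = execute
        # Stacked requests as (requesting machine, command), in arrival order.
        self._pending: Deque[Tuple[str, str]] = deque()

    def submit(self, uvm_id: str, command: str) -> None:
        """A user virtual machine hands one transaction to the interface."""
        self._pending.append((uvm_id, command))

    def process_all(self) -> Dict[str, List[str]]:
        """Serve the stacked requests one at a time, oldest first.

        Returns the replies grouped by the machine that made each request.
        """
        replies: Dict[str, List[str]] = {}
        while self._pending:
            uvm_id, command = self._pending.popleft()
            replies.setdefault(uvm_id, []).append(self._execute(command))
        return replies


def demo() -> Dict[str, List[str]]:
    """Two hypothetical user machines share one transaction machine."""
    tvm = TransactionMachine(execute=lambda command: "result of " + command)
    tvm.submit("UVM1", "query A")
    tvm.submit("UVM2", "query B")
    tvm.submit("UVM1", "query C")
    return tvm.process_all()
```

Because only one request is served at a time, every other request waits in
the queue, which is the source of the wait-for-data time analyzed later in
the paper.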

GMIS software has been designed using a hierarchical approach [Madnick, 1975,
1970; Dijkstra, 1968; Gutentag, 1975]. Several levels of software exist,
where each level only calls the levels below it. Each higher level
contains increasingly more general functions and requires less user
sophistication for use. The transaction virtual machine depicted in Figure 1
shows only two of these levels, the Multi-User Interface and SEQUEL
[Chamberlain, 1974]. The data base capabilities of this machine are based
on the relational view of data [Codd, 1970].

3. Analysis of Performance

The construction of a system of communicating VM's brings the previously
mentioned advantages, but these come at the expense of some sacrifice in
performance. Various performance studies of VM's are available in the
literature [Hatfield, 1972; Goldberg, 1974]. We report here on a theoretical
analysis, and in the next section on the practical implications, of the
degradation of variable cost performance as a function of the number of
modeling machines. The direction of this work can be seen by considering a
configuration as in Figure 1, where several modeling facilities, each running
on a separate virtual machine, are accessing and updating a data base that is
managed by a data base management system running on its separate virtual machine.
What is the degradation of performance with each additional user? What
determines the length of time the data-base machine takes to process a
request? What is the best locking strategy?

An access or update to the data-base machine may be initiated either
by a user query, which would be passed on by the modeling machine, or by
a model executing on the modeling machine. In either case, the data-base
machine, while processing a request, locks out (queues) all other requests.
The analysis is further complicated by the fact that as some VM's become
locked, the others get more of the real CPU's time and therefore
generate requests faster. However, the data-base VM also gets more of
the CPU's time, thereby processing requests faster. For example, if there
are ten virtual machines, each one receives one-tenth of the real CPU.
However, if seven of the ten are in a locked state, then the remaining
three each receive one-third of the CPU. Thus, these three run (in real time)
faster than they did when ten were running.
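
The arithmetic of this example can be checked with a few lines of Python.
This is a minimal sketch that assumes the real CPU is divided equally among
the VM's that are not locked; the function names cpu_share and speedup are
introduced here only for illustration.

```python
from fractions import Fraction


def cpu_share(total_vms: int, locked_vms: int) -> Fraction:
    """Share of the real CPU given to each unlocked VM under equal sharing."""
    running = total_vms - locked_vms
    if running <= 0:
        raise ValueError("at least one VM must be running")
    return Fraction(1, running)


def speedup(total_vms: int, locked_vms: int) -> Fraction:
    """Real-time speed of a running VM relative to the case with no VM locked."""
    return cpu_share(total_vms, locked_vms) / cpu_share(total_vms, 0)
```

With ten machines and seven locked, cpu_share(10, 7) is 1/3 and
speedup(10, 7) is 10/3, so each of the three running machines proceeds a
little more than three times as fast in real time as it did when all ten
were running.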

To try to analyze this circumstance for the uses outlined in this
paper, we have assumed that the virtual speeds of VM's are constant and
equal. However, when some VM's (including the data-base VM) are
allocated a larger share of CPU processing power, they become faster in real
time. We assume that each unblocked VM receives the same amount of
CPU processing power and that at the initial state m machines are running
(i.e., the data base machine is stopped if no modeling machines are
making requests). $\lambda$ is the request rate of each modeling VM when there
are m VM's running, and $\mu$ is the service rate at which the data base virtual
machine is running when there are m-1 modeling VM's and one data base VM
running. Thus, we may write the relations:


$$\lambda_i = \frac{m}{m-i+1}\,\lambda \qquad (i = 1, 2, \ldots, m)$$

$$\mu_i = \frac{m}{m-i+1}\,\mu \qquad (i = 1, 2, \ldots, m)$$

where i is the number of modeling VM's being blocked. Using a birth/death
process model [Drake, 1967] and a queueing analysis [Little, 1961],
we get the following for the response time of the model, where $P_i$ is the
steady-state probability that there are i modeling machines waiting and
N is the number of modeling machines:

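$$T'_{\text{model}} = N \cdot \frac{\sum_{i=0}^{m-1} P_i \left(\frac{m-i}{m}\right)}{\sum_{i=0}^{m-1} P_i \left(\frac{m-i}{m}\right) \lambda_i}$$

$$T'_{\text{overhead}} = \text{constant}$$

$$T'_{\text{wait-for-data}} = N \cdot \frac{\sum_{i=1}^{m} i\,P_i}{\sum_{i=1}^{m} \mu_i P_i}$$

$$T'_{\text{total}} = T'_{\text{overhead}} + T'_{\text{model}} + T'_{\text{wait-for-data}}$$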
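
A numerical sketch of this kind of calculation is given below in Python. It
is one possible reading of the model and not the authors' program, and the
following points are assumptions made for the example: state i counts the
blocked modeling machines; with i >= 1 blocked, every running VM is sped up
by the factor m/(m-i+1); with i = 0 the modeling machines run at the base
rate; the birth rate out of state i is (m-i) times the per-machine request
rate and the death rate is the data base service rate; and N (the argument
n) is treated simply as the multiplier that appears in the time formulas.
All function names are hypothetical.

```python
from typing import List, Tuple


def rates(m: int, lam: float, mu: float) -> Tuple[List[float], List[float]]:
    """Return (lam_i, mu_i) for i = 0..m blocked modeling VMs.

    Assumed reading: with i >= 1 machines blocked, m - i + 1 VMs share the
    CPU, so each running VM speeds up by m / (m - i + 1). For i = 0 the data
    base machine is stopped and the modeling machines run at the base rate.
    """
    lam_i = [lam] + [lam * m / (m - i + 1) for i in range(1, m + 1)]
    mu_i = [0.0] + [mu * m / (m - i + 1) for i in range(1, m + 1)]
    return lam_i, mu_i


def steady_state(m: int, lam: float, mu: float) -> List[float]:
    """Steady-state probabilities P_0..P_m of the assumed birth/death chain."""
    lam_i, mu_i = rates(m, lam, mu)
    weights = [1.0]
    for i in range(m):
        birth = (m - i) * lam_i[i]   # any of the m - i running models may ask
        death = mu_i[i + 1]          # the data base machine finishes a request
        weights.append(weights[-1] * birth / death)
    total = sum(weights)
    return [w / total for w in weights]


def elapsed_times(m: int, lam: float, mu: float, n: float,
                  overhead: float = 0.0) -> Tuple[float, float, float]:
    """Return (model time, wait-for-data time, total time) for one model."""
    lam_i, mu_i = rates(m, lam, mu)
    p = steady_state(m, lam, mu)
    # Fraction of a modeling machine's time spent running, and its request rate.
    running = sum(p[i] * (m - i) / m for i in range(m))
    requests = sum(p[i] * (m - i) / m * lam_i[i] for i in range(m))
    t_model = n * running / requests
    # Little's formula: mean number waiting divided by the service throughput.
    waiting = sum(i * p[i] for i in range(1, m + 1))
    served = sum(mu_i[i] * p[i] for i in range(1, m + 1))
    t_wait = n * waiting / served
    return t_model, t_wait, overhead + t_model + t_wait
```

Calling elapsed_times(m, lam, mu, n) for increasing m shows how the two
components trade off under these assumptions; a different normalization of
the rates would change the numbers.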


4.  Implication  of  Theoretical  Results 

Figure 2 illustrates the total time to execute three different
models as a function of the number of modeling VM's. Let us consider some
of the implications of the above analysis.
Figure 2. Total Elapsed Times for a VM Configuration
First, for $\lambda/\mu = 0.1$, a model executing in a configuration of one
modeling machine takes 110 units of time to execute. The same model, run in
an environment of 10 modeling machines all executing similar models, takes
approximately 135 units of time to execute, a degradation of performance
of slightly more than 15 percent. Intuitively, $\lambda$ denotes the speed of the
modeling machine, and $\mu$ is the speed of the data base machine. Thus a
situation where $\lambda/\mu = 0.1$ indicates that the data base machine is ten
times faster than the modeling machine. From the same figure, with a ratio of
$\lambda/\mu = 1$, a model executing in a configuration of one modeling machine
takes 20 units of time, whereas with ten machines the same model takes
approximately 90 units of time, over four times longer.

If such a degradation of performance is not tolerable, there are
several ways to improve performance. The theoretical study would indicate
that increasing $\mu$ for a given configuration helps performance. Practically
this could be done by changing the processor scheduling algorithm of VM
so that the real processor was assigned to the data base management VM
more often, thus speeding it up and increasing $\mu$.

Observing the equation for $T'_{\text{total}}$ above, another way of reducing
$T'_{\text{total}}$ is to reduce $T'_{\text{wait-for-data}}$. One way to reduce
$T'_{\text{wait-for-data}}$ is to extend the VM architecture of Figure 1 to
allow multiple data base machines. In this configuration
$T'_{\text{wait-for-data}}$ could be reduced by locking
out all data base machines only when one modeling machine is doing a write.
For all read requests the multiple data base machines would operate without
locking. Shared locks between machines would have to be created, as well
as a mechanism for keeping a write request pending until all data base
machines can be locked.
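
The locking rule just described (reads proceed freely, a write waits until
it can lock out everything else) can be sketched as a small state machine.
The Python below is only an illustration under simplifying assumptions, not
a design from this study: it is single-threaded, it tracks at most one
pending writer, and the class and method names are hypothetical.

```python
class ReadWriteGate:
    """Toy lock state shared by several data base machines (hypothetical)."""

    def __init__(self) -> None:
        self.active_reads = 0        # reads currently in progress
        self.write_pending = False   # a write is waiting for reads to drain
        self.write_active = False    # a write currently holds the lock

    def try_begin_read(self) -> bool:
        """Start a read unless a write holds the lock or is waiting for it."""
        if self.write_active or self.write_pending:
            return False
        self.active_reads += 1
        return True

    def end_read(self) -> None:
        if self.active_reads == 0:
            raise RuntimeError("no read in progress")
        self.active_reads -= 1

    def try_begin_write(self) -> bool:
        """Start a write only when nothing else is running.

        If reads are still in progress, the write is marked pending so that
        no new reads start; the caller retries later.
        """
        if self.write_active:
            return False
        if self.active_reads > 0:
            self.write_pending = True
            return False
        self.write_pending = False
        self.write_active = True
        return True

    def end_write(self) -> None:
        if not self.write_active:
            raise RuntimeError("no write in progress")
        self.write_active = False
```

Marking the write as pending is what keeps a steady stream of reads from
postponing the write indefinitely.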
A way of improving performance further would be to extend the single
locking mechanism used in the above multiple data base machine configuration
to handle multiple locks. Locks would be associated with groupings of
data, e.g., a table. The locking policy would be to have all machines
locked out of only that portion of the data into which one machine was
writing. Thus requests could be processed simultaneously for reads of
tables not being written and for reads of different tables. Adding
another real processor to the multiple lock VM configuration could then
greatly improve performance. There is a tradeoff with the multi-locking
scheme between increases in overhead time in maintaining multiple locks
and increases in wait time for locked data bases. We have not yet
extended the theoretical analysis to quantify this tradeoff.
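
A table-level locking policy of this kind might look as follows. This is a
hedged Python sketch, not part of GMIS: the class TableLocks, its methods,
and the machine and table names used with it are hypothetical, and the
sketch ignores queueing of refused requests.

```python
from typing import Dict, Set


class TableLocks:
    """Toy per-table lock table: many readers or one writer per table."""

    def __init__(self) -> None:
        self._writer: Dict[str, str] = {}         # table -> machine writing it
        self._readers: Dict[str, Set[str]] = {}   # table -> machines reading it

    def acquire(self, machine: str, table: str, write: bool) -> bool:
        """Grant the lock if compatible with current holders, else refuse."""
        if table in self._writer:
            return False                 # table is being written
        readers = self._readers.setdefault(table, set())
        if write:
            if readers:
                return False             # a write must wait for the readers
            self._writer[table] = machine
            return True
        readers.add(machine)
        return True

    def release(self, machine: str, table: str) -> None:
        """Drop whatever lock `machine` holds on `table`."""
        if self._writer.get(table) == machine:
            del self._writer[table]
        self._readers.get(table, set()).discard(machine)
```

Each additional lock is another entry to maintain and check on every
request, which is the overhead side of the tradeoff noted above.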

Hence,  this  study  indicates  there  may  be  a  degradation  in  performance 
with  multiple  users  but  that  there  are  mechanisms  for  ameliorating  the 
effects  of  this  degradation. 

Other  theoretical  extensions  and  analyses  of  this  synchronization 
model  would  include  extending  the  model  to  cover  a  more  common  VM  operating 
circumstance  --  namely,  that  where  the  GMIS  system  (multiple  modeling 
machines  and  one  data  base  machine)  would  have  to  share  the  physical  machine 
with  other  users,  also  executing  under  VM,  e.g.,  a  payroll  program  under 
VS2  under  VM,  multiple  CMS  users,  etc. 

In conclusion, our experience with this approach to date has been
very productive. We feel that further studies on cost-benefit analysis
and on increased effectiveness of users of this sort of system will
quantitatively confirm our observation of the benefits of this approach.